A Statistical Exploration of Himalayan Expeditions

INFO 526 Final Project Proposal

Analysis of Himalayan expeditions from 2020–2024, exploring the factors that influence success, failure, and safety outcomes.

Author
Affiliation

Kuthalaraja-Macdonald

School of Information, University of Arizona

Code
#load and install essential libraries
if (!require("pacman")) 
  install.packages("pacman")

pacman::p_load(
  tidyverse,    # Core tidyverse packages
  here,         # File path management
  ggrepel,      # Smart text labels
  ggthemes,     # Extra themes for ggplot2
  scales,       # Better axis formatting
  waffle,       # Waffle charts
  glue,         # String interpolation
  openintro,    # Datasets
  patchwork,    # Combine ggplots
  tidytuesdayR, #Tidyaccess
  GGally
)


# Set default theme for ggplot2
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 14))

# Global output width
options(width = 65)

# Global knitr options
knitr::opts_chunk$set(
  fig.width = 7,        # 7" width
  fig.asp = 0.618,      # Golden ratio aesthetic
  fig.retina = 3,       # dpi multiplier for displaying HTML output on retina
  fig.align = "center", # Center align figures
  dpi = 300             # Higher dpi, sharper image
)

Dataset

Code
# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load(2025, week = 3)

exped_tidy <- tuesdata$exped_tidy
peaks_tidy <- tuesdata$peaks_tidy

# Save as CSVs
write_csv(exped_tidy, here::here("data", "exped_tidy.csv"))
write_csv(peaks_tidy, here::here("data", "peaks_tidy.csv"))
  1. A brief description of your dataset including its provenance, dimensions, etc. as well as the reason why you chose this dataset.

The Himalayan Database was created to continue the legacy of Elizabeth Hawley, a journalist who spent decades documenting Himalayan expeditions. This procured version is provided on TidyTuesday GitHub repository and has been amended to the years 2020 - 2024 to manage file size.

There are two main datasets included: exped_tidy, which contains details of expeditions. And peaks_tidy, which includes data about the Himalayan Peaks.

Code
# Load TidyTuesday data inline
tuesdata <- tidytuesdayR::tt_load(2025, week = 3)
exped_tidy <- tuesdata$exped_tidy
peaks_tidy <- tuesdata$peaks_tidy
Code
# Dimensions of each tibble
dim(exped_tidy)
[1] 882  69
Code
dim(peaks_tidy)
[1] 480  29

Variables from exped_tidy

Variable Description Your Notes
TERMREASON_FACTOR Reason the expedition ended Outcome variable (success/failure) Also includes injury
SEASON_FACTOR Season of the expedition Obviously winter is more hazardous
O2USED Whether supplemental oxygen was used None
MEMBERS Number of expedition team members Team size
HIRDMEMB Number of hired staff None
COMRTE Indicates if expedition was commercially organized Commercial/non-commercial
YEAR Year of expedition 2020–2024 Dates 2020–2024
DEATHS Number of deaths during expedition Safety outcome
FUNDING_TYPE Commercial, Private, etc. Derived variable to filter for funding type Commercial vs Private

Variables from peaks_tidy

Variable Description Your Notes
HEIGHTM Elevation of the peak in meters High elevation might drive both risk and cost
HIMAL_FACTOR Name of the mountain None
REGION_FACTOR Broader geographic region grouping peaks Geographic variability in danger and cost
PSTATUS_FACTOR Peak status climbed or unclimbed Less frequently climbed peaks possibly more dangerous

We chose this dataset because it provides multiple opportunities to explore the intersection of geography, climate, and human decision-making under extreme conditions. It supports both modeling and visualization of expedition risk, success, and outcomes.

Questions

The two questions you want to answer.

Question 1. What factors contribute to the success or failure of a summit?

Question 2. Does an expeditions funding affect safety outcomes?

Analysis plan

  • A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

Question 1: What factors contribute to the success or failure of a summit?

We will analyze expedition outcomes using both expedition and peak variables. “Success” will be defined based on the TERMREASON_FACTOR, with specific values representing a successful summit (e.g., “Success (main peak)”, “Success (claimed)”). The aim is to model how geography, team size, oxygen use, and other factors relate to the likelihood of success.

We will filter by success and explore facets of the data. Below is an example of how we might focus our attention on correlated variables.

Code
# Select only numeric variables
exped_numeric <- exped_tidy |> select(where(is.numeric))

ggcorr(exped_numeric,
  label = TRUE,
  label_alpha = TRUE,
  label_size = 3,
  hjust = 1,
  layout.exp = 1
) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
    axis.text.y = element_text(size = 8),
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    plot.caption = element_text(size = 10)
  ) +
  
  labs(
    title = "Correlation Matrix of Expedition Variables",
    subtitle = "numeric variables only*",
    caption = "Source: Himalayan Database (2020–2024)"
  )

Variables involved:

These are some of the variables that we think might be promising. Note that the correlation matrix above is only numeric and does not include other data types. Neither the list or the correlation plot is all inclusive.

  • From exped_tidy:
    TERMREASON_FACTORSEASON_FACTORO2USEDMEMBERSHIRDMEMBCOMRTEYEAR

  • From peaks_tidy:
    HEIGHTMHIMAL_FACTORREGION_FACTORPSTATUS_FACTOR

Variables to be created:

  • SUMMIT_SUCCESS FUNDING_TYPE SUMMIT_SUCCESS will be composite variable, possibly grouped by country/team etc.

This will filter our data for further analysis.

Question 2. Does an expeditions funding affect safety outcomes?

The analysis will be very similar for this question. I suspect there will be some overlap, funding and success rate etc. We will use many of the same variables, but more specifically, FUNDING_TYPE.

NOTE: Add “real world” examples of expedition success or failure to make analysis relatable.